The M.2 Max is an AI inference acceleration card powered by the Metis AIPU, designed to enable Large Language Models (LLMs) and Vision Language Models (VLMs) on power-constrained edge and embedded devices. It offers high memory performance in a small footprint and supports complex computer vision tasks using parallel or cascaded models.
Key features include:
- Memory capacities up to 16 GB with various cooling options.
- Support for standard and extended operating temperature ranges.
- Hardware Root-of-Trust for secure boot and firmware integrity.
- Integration via the Voyager SDK and advanced quantization tools.
- Compatibility with a PCIe Gen 3.0 x4 host interface and Intel, AMD, and Arm64 processors, on both Linux and Windows.
PrismML, a venture originating from Caltech, has introduced its new 1-bit large language model, Bonsai 8B, designed to significantly enhance AI efficiency on edge hardware. This innovative model architecture represents weights using only their sign and a shared scale factor, resulting in a memory footprint of just 1.15 GB. Compared to full-precision models, Bonsai 8B is 14 times smaller, 8 times faster, and 5 times more energy-efficient, while maintaining competitive performance. By drastically reducing memory and power requirements, PrismML aims to enable advanced AI applications on mobile devices, real-time robotics, and secure enterprise systems, effectively moving powerful language models out of massive cloud datacenters and onto local hardware.
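The sign-plus-shared-scale representation described above can be illustrated with a minimal sketch. This is a generic 1-bit quantization toy in NumPy, assuming a per-tensor scale equal to the mean absolute weight; it is not PrismML's actual implementation, and the function names are hypothetical.

```python
import numpy as np

def binarize(w):
    """Toy 1-bit quantization: each weight keeps only its sign,
    plus one shared scale factor for the whole tensor.
    (Illustrative sketch, not PrismML's code.)"""
    scale = np.abs(w).mean()   # single shared scale per tensor
    signs = np.sign(w)         # +1.0 / -1.0 per weight (1 bit each)
    return signs, scale

def dequantize(signs, scale):
    """Reconstruct an approximate weight tensor from signs and scale."""
    return signs * scale

w = np.array([0.4, -0.2, 0.1, -0.7])
signs, scale = binarize(w)   # signs: [1, -1, 1, -1], scale: 0.35
```

Stored this way, each weight costs one bit instead of 32, which is how an 8B-parameter model can fit in roughly 1 GB plus the overhead of scales and embeddings.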
Google has released the Gemma 4 model family, with NVIDIA providing optimized support across a wide range of hardware, from data centers to edge devices like Jetson. This new generation includes the first Gemma MoE model and supports over 140 languages, enabling advanced capabilities like reasoning, code generation, and multimodal input.
Developers can fine-tune and deploy Gemma 4 using tools like NeMo Automodel and NVIDIA NIM, with commercial licensing available. The models are optimized for local deployment with frameworks such as vLLM, Ollama, and llama.cpp, offering flexibility for various use cases, including robotics, smart machines, and secure on-premise applications.
This paper proposes SpaceCoMP, a MapReduce-inspired processing model for LEO satellite mesh networks, addressing the challenge of downlink bandwidth limitations by processing data in orbit. It leverages orbital dynamics and proposes optimizations for routing and task scheduling to improve data processing efficiency.
This article details how to set up and use Machinechat JEDI with the Seeed Studio reTerminal DM for industrial IoT applications, including hardware/software preparation, installation, data pipeline creation, visualization, and MQTT integration.
Orange Pi has announced the Orange Pi AI Station, a compact edge computing platform featuring the Ascend 310 processor, offering up to 176 TOPS of AI compute performance with options for up to 96GB of LPDDR4X memory and NVMe storage.
This article details how to build a fast, offline AI chatbot using a Raspberry Pi 5, RLM AA50 accelerator card, and optimization techniques for speech recognition, natural language processing, and text-to-speech tasks.
Researchers report a unified memory stack that functions both as a memristor and as a ferroelectric capacitor, enabling energy-efficient inference as well as on-device learning at the edge.
This paper proposes SkyMemory, a LEO satellite constellation hosted key-value cache (KVC) to accelerate transformer-based inference, particularly for large language models (LLMs). It explores different chunk-to-server mapping strategies (rotation-aware, hop-aware, and combined) and presents simulation results and a proof-of-concept implementation demonstrating performance improvements.
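The hop-aware mapping idea can be sketched in a few lines. The following toy assigns each KV-cache chunk to a satellite server using a simple cost of hop distance plus current load; the inputs (a hop-count table, a server list) are assumed for illustration, and the paper's actual rotation-aware and combined strategies are considerably more involved.

```python
def map_chunks_hop_aware(chunks, servers, hops):
    """Toy hop-aware chunk-to-server mapping: place each KV-cache chunk
    on the server with the lowest cost, where cost = hop distance from
    the inference node + number of chunks already assigned (crude load
    balancing). Illustrative sketch only, not the SkyMemory algorithm."""
    assignment = {}
    load = {s: 0 for s in servers}
    for c in chunks:
        best = min(servers, key=lambda s: hops[s] + load[s])
        assignment[c] = best
        load[best] += 1
    return assignment

# Two satellites: 'a' is 1 hop away, 'b' is 2 hops away.
mapping = map_chunks_hop_aware([0, 1, 2], ["a", "b"], {"a": 1, "b": 2})
```

As load on the nearer satellite grows, later chunks spill over to the farther one, trading extra hops for balanced placement.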
The article introduces the concept of Federated Language Models, combining edge-based Small Language Models (SLMs) with cloud-based Large Language Models (LLMs) for enhanced privacy and performance in AI applications.
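The edge/cloud split behind Federated Language Models can be sketched as a confidence-based router: answer on-device with the SLM when it is confident, and escalate to the cloud LLM otherwise. The callables and confidence field below are hypothetical placeholders, not an API from the article.

```python
def route(query, slm, llm, threshold=0.7):
    """Hypothetical federated router: the edge SLM answers when its
    confidence clears the threshold, keeping data on-device; otherwise
    the query escalates to the cloud LLM. `slm` and `llm` are
    placeholder callables for illustration."""
    local = slm(query)
    if local["confidence"] >= threshold:
        return ("edge", local["text"])
    return ("cloud", llm(query))

# Toy stand-ins to demonstrate the control flow:
def tiny_slm(q):
    known = {"battery status?": ("78%", 0.95)}
    text, conf = known.get(q, ("", 0.1))
    return {"text": text, "confidence": conf}

def cloud_llm(q):
    return f"[cloud answer to: {q}]"
```

The privacy benefit comes from the control flow: only queries the SLM cannot handle ever leave the device.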